Abstract

About 1 million people in the world die each year from diseases spread by mosquitoes, so understanding how mosquitoes identify their hosts through olfaction is of great importance. The role of odorant binding proteins (OBPs) in the primary molecular events of olfaction in mosquitoes is becoming an important focus of biological research in this area. Here, we present a comprehensive comparative genomics study of OBPs in three disease-transmitting mosquito species, Anopheles gambiae, Aedes aegypti, and Culex quinquefasciatus, starting with the identification of 110 new OBPs in these three genomes. We have characterized their genomic distribution and their orthologous and phylogenetic relationships. The diversity and expansion observed in the Aedes and Culex genomes suggest that the OBP gene family acquired functional diversity concurrently with functional constraints imposed on these two species. Sequences with unique features, such as the "two-domain OBPs" (previously known as Atypical OBPs) and the "MinusC OBPs," have been characterized in mosquito genomes. The extensive comparative genomics presented in this work hence provides useful primary insights into the role of OBPs in the molecular adaptations of the mosquito olfactory system and could provide further clues for identifying potential targets for insect repellents and attractants.

Key words: odorant binding proteins, OBP, mosquito, Culex quinquefasciatus, Aedes aegypti, Anopheles gambiae, olfaction, phylogeny.



Introduction 

The spread of infectious diseases among humans is mediated primarily by the world's most dangerous animal, the mosquito. Among mosquitoes, anthropophilic species such as Anopheles gambiae, Anopheles funestus, Aedes albopictus, Aedes aegypti, and Culex quinquefasciatus are the most effective transmitters of viruses and parasites. They are responsible for the spread of a number of life-threatening diseases such as malaria, dengue, and West Nile encephalitis and, more recently, Chikungunya, which has a lower mortality rate than the other diseases. According to the World Health Organization, global climate change is expanding the range of mosquitoes, heightening the risk of disease for millions of additional people. Primary prevention, either by controlling the population of these vectors or by preventing the interaction between vector and host, is one of the most important means of curbing the spread of these diseases.

Understanding the molecular mechanism of human host recognition mediated by olfaction would help identify new strategies for preventing this primary contact. Volatile products secreted by the human host in the course of metabolism are responsible for attracting these vectors to the host. In insects, as in mammals, the ability to recognize and discriminate thousands of odorant molecules relies on specialized chemosensitive neurons expressing olfactory receptor proteins (ORs), which reside within segregated compartments called sensilla. Each sensillum is a hair-like structure bathed in sensillum lymph, which contains a number of secreted proteins (McKenna et al. 1994; Pikielny et al. 1994; Wang et al. 1999). The odorant binding proteins (OBPs) are important water-soluble components of this sensillum lymph. They were first identified in moths as pheromone binding proteins (PBPs) (Vogt and Riddiford 1981). Owing to their high divergence within the family, these globular proteins are believed to bind different odorant molecules (Plettner et al. 2000) and to transport them to their respective olfactory receptors, triggering the olfactory response (Pelosi and Maida 1995).

The arthropod OBPs form a large, specific multigene family. They are 10-30 kDa globular, water-soluble proteins characterized by a specific six-α-helical domain containing six highly conserved cysteines with distinct disulphide connectivities. These structural features are now considered the hallmark of this protein family (Calvo et al. 2002; Valenzuela et al. 2002; Calvo et al. 2006). OBPs have been identified in a number of insect species, including four dipteran species: Drosophila melanogaster (Galindo and Smith 2001; Graham and Davies 2002; Hekmat-Scafe et al. 2002; Valenzuela et al. 2002; Zhou et al. 2004; Vieira et al. 2007; Vieira and Rozas 2011), A. gambiae (Vogt 2002; Xu et al. 2003; Zhou et al. 2004; Li et al. 2005; Vieira and Rozas 2011), Aed. aegypti (Zhou et al. 2008), and C. quinquefasciatus (Pelletier and Leal 2009, 2011). These proteins are highly divergent in sequence within the family, and sequence identity between family members from different species can drop as low as 8% (Vieira and Rozas 2011). In Drosophila, two subgroups have been identified (Hekmat-Scafe et al. 2002): (i) OBPs lacking two of the six conserved cysteines, called MinusC OBPs, and (ii) OBPs carrying additional conserved cysteines, called PlusC OBPs. The MinusC OBPs typically lack the second and fifth Cys residues. However, this definition is somewhat ambiguous, since three Drosophila OBPs in this cluster contain all six hallmark cysteines (Pelosi and Maida 1995). To date, MinusC OBPs have never been described in mosquito genomes.

In mosquitoes, three subfamilies of OBP genes have been characterized so far: (i) the Classic OBPs, which carry the six conserved cysteines characteristic of the OBP family; (ii) the PlusC OBPs, which have the same conserved cysteines and disulphide connectivity but contain six additional cysteines with novel disulphide connectivities; and (iii) the Atypical OBPs, which are among the longest known OBPs and were initially described as containing a single Classic OBP domain at the N-terminus followed by a less well-characterized C-terminal extension. Very recently, Atypical OBPs were shown to comprise two domains that are in fact homologous to the Classic OBP domain, and they were hence considered "dimer OBPs" (Vieira and Rozas 2011).

In A. gambiae and Aed. aegypti, OBPs from all three subfamilies have been reported to date, while in C. quinquefasciatus, only the Classic and PlusC members of this family have been reported so far (Pelletier and Leal 2009, 2011). Atypical OBPs have not yet been reported in this genome.

An additional multigene family, the D7 salivary proteins, is known to be distantly related to the arthropod OBP superfamily (Calvo et al. 2002, 2006, 2009). There are two types of D7 salivary proteins in mosquito genomes, the short and the long forms, which contain one and two OBP-like domains, respectively (Valenzuela et al. 2002; Kalume et al. 2005; Choumet et al. 2007). The available structures of the D7 proteins indicate that their domains adopt a fold similar to that of the OBP domains but decorated with additional structural features and a seventh helix. In the two-domain D7 protein, the C-terminal OBP-like domain has been shown to bind biogenic amines in A. gambiae and Aed. aegypti (Mans et al. 2008; Calvo et al. 2009), while the N-terminal domain in Aed. aegypti was shown to have a specific bioactive lipid-binding activity (Calvo et al. 2009). In the current analysis, these D7 proteins serve as outgroups in the phylogenetic trees of the OBP gene family.

This work describes the identification and extension of OBPs in the mosquito genomes of A. gambiae, Aed. aegypti, and C. quinquefasciatus. We significantly extend the OBP gene family, with a total of 110 new members in these three genomes, and report the presence of all three classes of OBPs in all three mosquito genomes. In particular, we identified the Atypical class of OBPs in C. quinquefasciatus. We further confirm that "Atypical OBPs" are composed of two domains homologous to Classic OBPs and provide an in-depth characterization of their origin and structural features. This work also provides a comprehensive and robust subclassification of the different OBP classes through structure-based alignments and phylogenetic analysis, which may reflect the functional divergence of these proteins. We also provide a detailed primary structural and phylogenetic characterization of all these novel OBP subtypes. An extensive set of supplementary materials detailing our analyses and results is provided.

Results 

Extension of OBPs Family in All Three Mosquito Genomes 

In previously published work, 65 OBPs from A. gambiae (Vogt 2002; Xu et al. 2003; Zhou et al. 2004, 2008), 64 from Aedes aegypti (Zhou et al. 2008), and 53 from C. quinquefasciatus (Pelletier and Leal 2009) were identified. These groups classified the OBPs into three main subfamilies, Classic, PlusC, and Atypical, based on sequence features (fig. 1). Very recently, Vieira and Rozas (2011) added four new putative genes to the A. gambiae OBP gene repertoire and 13 PlusC OBPs to the C. quinquefasciatus genome. These new genes were also identified by our sequence searches and bioinformatics analysis (see Materials and Methods) (table 1 and supplementary table S1a-e, Supplementary Material online). The FASTA sequences of the identified genes are available for download as supplementary material.

Table 1

Identification of OBPs in Anopheles gambiae, Aedes aegypti, and Culex quinquefasciatus Genomes

                               Subfamily
                               Classic   PlusC   Atypical   Not Determined   New Total
A. gambiae                                                                        69
  Previously reported (a)         29       16       16
  Newly identified                          4
Aed. aegypti                                                                     111
  Previously reported (b)         33       17       14
  Newly identified                 6       10       31
C. quinquefasciatus                                                              109
  Previously reported (c)         48
  Newly identified                21       12       26

Note. The table shows counts of previously and newly identified OBP members (AgamOBP65 to AgamOBP68, AaegOBP67 to AaegOBP114, CquiOBP54 to CquiOBP112) in all three mosquito genomes. Detailed results are provided in the accompanying supplementary tables S1a-e, Supplementary Material online.

a Vogt (2002); Xu et al. (2003); Zhou et al. (2004); and Vieira and Rozas (2011).

b Zhou et al. (2008) and Pelletier et al. (2009).

c Pelletier et al. (2009).

This study substantially expands the Atypical OBP subfamily of mosquitoes: 31 new members (AaegOBP84 to AaegOBP114) are identified in Aed. aegypti, and, interestingly, they show high sequence similarity to the 26 new Atypical members (CquiOBP75-CquiOBP100) from the C. quinquefasciatus genome reported in this work (supplementary table S1c and d, Supplementary Material online).

In the Classic OBP subfamily, we have annotated six new members in the Aed. aegypti genome and 21 in the C. quinquefasciatus genome. In addition, 10 new members have been added to the PlusC subfamily of the Aed. aegypti genome, bringing the total to 110 members added to the OBP gene family of mosquitoes [including sequences identified by Vieira and Rozas (2011) and Pelletier and Leal (2011)].

Two-Domain OBPs and MinusC OBPs 

Owing to the low sequence identity and the length variation among members of the OBP family, a structure-based alignment was used to align them (see Materials and Methods). Compared with regular multiple sequence alignments, it greatly improved alignment quality, notably for (i) the precise classification of the new OBPs into the three subfamilies and (ii) the identification of residues in structurally conserved positions that would otherwise have been missed (supplementary fig. S3a-c, Supplementary Material online).

The conservation pattern of cysteines across the different classes was clearly highlighted in these structure-based alignments but could not be obtained with ordinary sequence alignment methods. In this article, we number the cysteine positions C1 to C6 according to the order of their positions in the Classic OBP proteins. A detailed schematic representation of the cysteine spacing and conservation, together with the predicted disulphide patterns, is given in figure 1. Overall, the six cysteine residues involved in disulphide bond formation, which are considered the hallmark of this protein family (Calvo et al. 2002; Valenzuela et al. 2002; Calvo et al. 2006), are well conserved across the Classic, PlusC, and Atypical subclasses.

Interestingly, sequences lacking the C2 and C5 cysteines were observed in the alignments. OBPs that lack these two particular cysteines, called MinusC OBPs, have been characterized and expressed in other insect genomes such as Drosophila, Bombyx mori, Tribolium castaneum, and Apis mellifera (Vieira and Rozas 2011), but their presence in mosquito genomes has not been shown previously. AaegOBP78 from Aed. aegypti and 15 proteins from C. quinquefasciatus (CquiOBP59-CquiOBP62, CquiOBP64-CquiOBP74) were found to lack these two cysteines. As all these sequences retained either the N-terminal signal peptide or the PBP/GOBP domain, they were kept in our analysis as MinusC OBPs (supplementary tables S3 and S4, Supplementary Material online).
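The C2/C5 criterion can be expressed as a simple check on an alignment. The sketch below is illustrative only: it assumes the alignment columns of C1-C6 are already known (hard-coded here as a hypothetical `CYS_COLUMNS` tuple for a toy 18-residue alignment, whereas in this work they come from the structure-based alignment), and it does not model the signal-peptide or PBP/GOBP-domain checks that were also used to retain MinusC candidates.

```python
# Hypothetical 0-based alignment columns of the six hallmark cysteines (C1-C6)
# in a toy alignment; real columns would come from a structure-based alignment.
CYS_COLUMNS = (1, 4, 7, 10, 13, 16)
CYS_LABELS = ("C1", "C2", "C3", "C4", "C5", "C6")


def missing_cysteines(aligned_seq, columns=CYS_COLUMNS):
    """Return the set of hallmark cysteine labels absent from an aligned sequence."""
    return {
        label
        for label, col in zip(CYS_LABELS, columns)
        if aligned_seq[col] != "C"
    }


def cysteine_class(aligned_seq, columns=CYS_COLUMNS):
    """Label a sequence by its hallmark-cysteine pattern (illustrative only)."""
    missing = missing_cysteines(aligned_seq, columns)
    if not missing:
        return "six conserved cysteines"
    if missing == {"C2", "C5"}:
        return "MinusC pattern"
    return "other (lacks " + ", ".join(sorted(missing)) + ")"


if __name__ == "__main__":
    # Toy aligned sequences, not real OBPs.
    print(cysteine_class("MCAACAACAACAACAACK"))  # six conserved cysteines
    print(cysteine_class("MCAASAACAACAAGAACK"))  # MinusC pattern
```

Because the check works on fixed alignment columns, its output is only as reliable as the alignment; this is why a structure-based alignment matters for placing divergent OBPs.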

We also observed interesting cysteine conservation patterns among the Atypical OBPs. The Atypical OBPs were previously described as proteins that hold a Classic OBP domain at the N-terminal end with an uncharacterized C-terminal domain. However, close analysis of the extended C-terminal end of Atypical members highlighted the presence of six additional cysteines conserved within this subfamily, with a spacing pattern very similar to that of the conserved cysteines (C1-C6) at their N-terminal end. This cysteine conservation pattern in the Atypical OBPs emerged directly from the annotation of new members in this subfamily and, to our knowledge, has never been described before. We hence propose to annotate these cysteines as C1'-C6'. Such remarkable conservation of cysteines is believed to hold important evolutionary information (Thangudu et al. 2005, 2008). Following this, we characterized the homologues of each of the two domains and identified their closest Classic OBP homologue in their corresponding genomes and in the Drosophila genome, which confirms that the Atypical OBPs are indeed "two-domain OBPs." It is noteworthy that within the Atypical (two-domain) subfamily, a distinctive subtype called matype2 (see below and fig. 1) has only six cysteines (C1, C3, C4, C6, C4', and C6'), whereas the other subtypes carry all 12 cysteines. The Cys conservation pattern of the N-terminal domain of these OBPs is similar to that of the MinusC OBPs; however, the C-terminal domain has lost comparatively more cysteines.

Analysis of OBP Genes: Orthology across the Three Genomes and Their Corresponding Distribution

We investigated the orthology and genomic distribution of OBPs in the three genomes. At the time of this work, an assembled genome was available only for A. gambiae (Ensembl Genomes and VectorBase version 3.4). The chromosomal location of each OBP gene in Anopheles is hence known precisely (fig. 2). Their chromosomal distribution in the Anopheles genome is shown in supplementary figure S1a-e and listed in supplementary table S1a, Supplementary Material online. Although the syntenic relationship between the chromosome arms in A. gambiae and their orthologous chromosome arms in Culex and Aedes was established by Arensburger et al. (2010) with the help of genetic markers (supplementary table S2, Supplementary Material online), the genomic data of these two Culicinae species are available only as supercontig fragments (Nene et al. 2007; Arensburger et al. 2010) and have yet to be assembled. In these two genomes, only a few supercontigs (about 10%) harbor markers that allow their chromosomal localization (Arensburger et al. 2010), and very few of these anchored supercontigs host OBP genes. Most supercontigs containing OBP genes do not harbor any genomic markers and hence cannot be assigned to a chromosome in Aedes and Culex. However, in many cases, direct orthologs in the Anopheles genome could be identified (fig. 2, supplementary fig. S1a-e and supplementary table S1a, c, and e, Supplementary Material online). OBP orthologs were identified using the reciprocal BLAST hit approach (Moreno-Hagelsieb and Latimer 2008), which is widely used for ortholog detection. As illustrated in figures 2 and 3 and in supplementary figure S1a-e, Supplementary Material online, three-way orthology (1:1:1) between OBP genes in the three genomes was identified in 31 cases, while two-way orthology (1:1) between OBP genes from only two genomes was identified in 5 cases between Anopheles and Culex, 6 between Anopheles and Aedes, and 19 between Aedes and Culex (fig. 3), confirming the genetic proximity between Aed. aegypti and C. quinquefasciatus. Our analysis is in complete agreement with the microsynteny analysis described very recently by Pelletier and Leal (2011), indicating that the detected orthology may serve as a basis for further syntenic analysis.

Interestingly, the overwhelming majority of the OBP genes are organized in gene clusters in the three genomes (supplementary fig. S1a-e, Supplementary Material online). The clusters are mainly composed of gene duplicates. The genes in these genomic clusters hence share high sequence identity (data not shown) and are phylogenetically very close (see below), as confirmed by inparalogy data from the inParanoid database (O'Brien et al. 2005). The expansion of the OBP gene repertoire in Aed. aegypti and C. quinquefasciatus relative to A. gambiae was mainly driven by these gene duplication events, which are more numerous in these two Culicinae species. There are 12 OBP gene clusters in Aed. aegypti and 13 in C. quinquefasciatus, compared with 6 in A. gambiae. The largest gene clusters are found in Aedes and Culex, and a few clusters contain as many as 12 genes. Notably, 21 of the 26 newly identified Atypical (two-domain) OBP genes from C. quinquefasciatus are distributed into three main gene clusters (fig. 2 and supplementary fig. S1a-e, Supplementary Material online). Similarly, 10 of the 12 newly identified PlusC proteins are distributed into three gene clusters.
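The distance criterion used to call these gene clusters is not stated in this section, so the sketch below simply assumes a hypothetical maximum intergenic gap (`max_gap`, in base pairs) and groups consecutive genes on the same chromosome or supercontig. The gene names, contigs, and coordinates in the usage example are invented for illustration.

```python
from collections import defaultdict


def call_gene_clusters(genes, max_gap):
    """Group genes into clusters of two or more genes on the same contig.

    `genes` is a list of (name, contig, start, end) tuples. Genes sorted by
    start position join the current cluster when the distance from the end of
    that cluster to the gene's start is at most `max_gap` (a hypothetical
    threshold, not one taken from the study).
    """
    by_contig = defaultdict(list)
    for gene in genes:
        by_contig[gene[1]].append(gene)

    clusters = []
    for contig in sorted(by_contig):
        ordered = sorted(by_contig[contig], key=lambda g: g[2])
        current, current_end = [ordered[0][0]], ordered[0][3]
        for name, _, start, end in ordered[1:]:
            if start - current_end <= max_gap:
                current.append(name)
                current_end = max(current_end, end)
            else:
                if len(current) >= 2:
                    clusters.append(current)
                current, current_end = [name], end
        if len(current) >= 2:
            clusters.append(current)
    return clusters


if __name__ == "__main__":
    # Hypothetical genes on two supercontigs.
    genes = [
        ("g1", "sc1", 100, 900),
        ("g2", "sc1", 1500, 2300),
        ("g3", "sc1", 50000, 51000),
        ("g4", "sc2", 10, 500),
        ("g5", "sc2", 800, 1200),
    ]
    print(call_gene_clusters(genes, max_gap=5000))  # [['g1', 'g2'], ['g4', 'g5']]
```

Tracking the running end of the cluster (rather than only the previous gene's end) keeps nested or overlapping gene models from splitting a cluster prematurely.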

Phylogeny-Based OBP Clusters 

As expected and as reported previously, OBP family members showed high divergence. The average sequence identities among OBP genes in A. gambiae, Aed. aegypti, and C. quinquefasciatus are 12.5%, 12.8%, and 13.1%, respectively, and their phylogenetic trees (see Materials and Methods) also indicated high sequence divergence (supplementary fig. S2a-c, Supplementary Material online). However, a comparative analysis of the different OBP subfamilies in the mosquito genomes provided more meaningful clustering patterns within each subfamily. The analysis was based on sequence alignments and phylogenetic trees constructed from the sequences of each individual subfamily from all three mosquito genomes and, in the case of the Classic members, the Drosophila OBPs (Hekmat-Scafe et al. 2002). A bootstrap consensus tree was constructed using the neighbor-joining method (Saitou and Nei 1987) with all the Classic OBPs from the three mosquito genomes and D. melanogaster, with 1,000 bootstrap replicates (fig. 4). Grouping the Classic OBPs into clusters based on significant bootstrap values (50% cutoff) suggested 18 possible subtypes. These clusters contained orthologous and paralogous sequences from the three genomes. A few members of the mosquito genomes clustered with Drosophila OBPs (Hekmat-Scafe et al. 2002), and these clusters were named after their closest Drosophila OBPs. Among these, OS-E/OS-F, Pbprp1, LUSH, OBP19a, and Pbprp4 have already been described (Xu et al. 2003; Zhou et al. 2008; Pelletier and Leal 2009). However, one member from C. quinquefasciatus has now been annotated in each of the two subtypes OS-E/OS-F (CquiOBP58) and OBP19a (CquiOBP57). The large expansion of sequences (CquiOBP25-CquiOBP42) observed by Pelletier and Leal (2009) was found to be homologous to AaegOBP57 and AgamOBP13, and these sequences were indeed closely related to Pbprp2/Pbprp5 of Drosophila. CquiOBP55 and AaegOBP83, identified in this analysis, are orthologs of AgamOBP29 and homologous to Drosophila OBP59a, and they have an unusually long sequence, as recently mentioned by Vieira and Rozas (2011). Three orthologous OBP sequences, AgamOBP9, AaegOBP22, and CquiOBP43, clustered with considerable bootstrap support with the Drosophila MinusC members OBP99a, OBP44a, and OBP99b, among which only OBP99a retains all six cysteines, while the other two lack the C2 and C5 cysteines (see below). Among the Drosophila MinusC OBPs, three members (Obp83f, Obp99a, and Obp99d) retain all six conserved cysteines, whereas four (Obp8a, Obp44a, Obp99b, and Obp99c) lack the C2 and C5 cysteines. Therefore, the mosquito OBPs that cluster with these Drosophila OBPs do not represent true MinusC OBPs. The other clusters, which have no close Drosophila homologue, are named mclassic1-9 (fig. 4 and supplementary fig. S3a, Supplementary Material online). In addition to these subtypes, one group of 16 members lacking the C2 and C5 cysteines and displaying outstanding sequence features (supplementary fig. S3a, Supplementary Material online) has been named "Bombyx mori MinusC" because of its homology with the B. mori MinusC sequences, although its branch has a bootstrap value of only 35%. This homology was determined using BLAST analysis and confirmed with the inParanoid eukaryotic ortholog database (O'Brien et al. 2005). The other subtype classifications of the Classic members were also similar to the clustering seen in the inParanoid database.

Fig. 3. Analysis of orthologous OBP genes shared across three mosquito species, Anopheles gambiae, Aedes aegypti, and Culex quinquefasciatus. The Venn diagrams indicate the number of inferred orthologous genes shared among the mosquito species: (a) number of A. gambiae OBP genes orthologous to Aed. aegypti and C. quinquefasciatus; (b) number of Aed. aegypti OBP genes orthologous to A. gambiae and C. quinquefasciatus; (c) number of Culex OBP genes orthologous to A. gambiae and Aed. aegypti; (d) overall number of orthologous groups across the three mosquito species. The orthologs were identified using the reciprocal BLAST hit approach. The number of genes that share a three-way (1:1:1) orthology among the three species is 31. The number of genes in a species that have two-way orthology (1:1) with both other species but not a three-way orthology is indicated in parentheses and, for a given species, should be counted only once. For example, in (a), the total number of OBP genes in A. gambiae is 30 + 3 + 2 + 31 + (3) = 69, since three genes in A. gambiae have two-way orthology (1:1) with genes in both C. quinquefasciatus and Aed. aegypti but not a three-way orthology. Detailed listings of the orthology analysis are provided in supplementary table S1a, c, and e, Supplementary Material online.
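The neighbor-joining step used for the Classic OBP tree (Saitou and Nei 1987) can be sketched in a few lines. This is a topology-only illustration on a small hypothetical distance matrix: it omits branch lengths, the derivation of distances from the alignment, and the 1,000 bootstrap replicates, and the taxon names `a` to `e` are placeholders. Real analyses would use a dedicated phylogenetics package.

```python
from itertools import combinations


def neighbor_joining(labels, matrix):
    """Return the neighbor-joining topology as nested tuples (no branch lengths).

    `labels` names the taxa; `matrix` is a symmetric distance matrix in the same order.
    """
    dist = {}
    for i, j in combinations(range(len(labels)), 2):
        dist[frozenset((labels[i], labels[j]))] = matrix[i][j]

    def d(x, y):
        return dist[frozenset((x, y))]

    nodes = list(labels)
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all remaining nodes.
        r = {x: sum(d(x, y) for y in nodes if y != x) for x in nodes}
        # Join the pair minimising Q(x, y) = (n - 2) d(x, y) - r(x) - r(y).
        x, y = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d(*p) - r[p[0]] - r[p[1]])
        joined = (x, y)
        # Distances from the new internal node to every other node.
        for z in nodes:
            if z not in (x, y):
                dist[frozenset((joined, z))] = 0.5 * (d(x, z) + d(y, z) - d(x, y))
        nodes = [z for z in nodes if z not in (x, y)] + [joined]
    return tuple(nodes)


def clades(tree):
    """Collect the leaf set of every internal node of a nested-tuple tree."""
    found = set()

    def leaves(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    leaves(tree)
    return found


if __name__ == "__main__":
    # Hypothetical distances between five placeholder sequences.
    labels = ["a", "b", "c", "d", "e"]
    matrix = [[0, 5, 9, 9, 8],
              [5, 0, 10, 10, 9],
              [9, 10, 0, 8, 7],
              [9, 10, 8, 0, 3],
              [8, 9, 7, 3, 0]]
    tree = neighbor_joining(labels, matrix)
    print(tree)
    print(sorted(sorted(c) for c in clades(tree)))
```

A bootstrap analysis would repeat this on trees built from resampled alignment columns and count how often each clade returned by `clades` recurs.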

Fig. 4. Unrooted phylogenetic tree of Classic OBPs in the three mosquito genomes and in Drosophila melanogaster. The Anopheles gambiae, Aedes aegypti, and Culex quinquefasciatus members are colored in mustard, pink, and turquoise, respectively. The bootstrap values of the branches are indicated on the nodes as percentages. The names of the identified clusters within the Classic OBP subfamily are indicated on the branches. Detailed alignments of the members of each cluster are provided in supplementary figure S3a, Supplementary Material online.

As shown in figure 5, the PlusC OBPs formed seven major phylogenetic clusters at a bootstrap cutoff of 50%, but we further subdivided them into 11 subtypes (mplus1-mplus11). Indeed, although the interior node of the mplus7-11 cluster has a bootstrap value of 57%, we separated its members into different subtypes because they clearly hold distinct sequence features (supplementary fig. S3b, Supplementary Material online). Furthermore, analysis of the chromosomal localization of PlusC members from A. gambiae shows that mplus11 subtype members are specific to chromosome 3L, while all other PlusC OBPs are distributed on chromosome 2L. At this stage, it is difficult to interpret the molecular background behind this clustering.
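Applying a 50% bootstrap cutoff amounts to counting how often a clade recurs across replicate trees. A minimal sketch, assuming each replicate tree has already been reduced to a set of clades (frozensets of sequence names) and assuming that a clade exactly at the cutoff counts as supported; the sequence names are hypothetical.

```python
from collections import Counter


def clade_support(replicate_clades):
    """Percentage of bootstrap replicates in which each clade occurs.

    `replicate_clades` holds one collection of clades per replicate tree;
    each clade is a frozenset of sequence names.
    """
    counts = Counter(clade for clades in replicate_clades for clade in set(clades))
    total = len(replicate_clades)
    return {clade: 100.0 * k / total for clade, k in counts.items()}


def supported(support, cutoff=50.0):
    """Keep clades whose support reaches the cutoff (inclusive, an assumption)."""
    return {clade: s for clade, s in support.items() if s >= cutoff}


if __name__ == "__main__":
    # Four hypothetical replicate trees over placeholder sequences X1-X4.
    x12, x34, x13 = frozenset({"X1", "X2"}), frozenset({"X3", "X4"}), frozenset({"X1", "X3"})
    replicates = [{x12, x34}, {x12}, {x12, x34}, {x13}]
    for clade, pct in supported(clade_support(replicates)).items():
        print(sorted(clade), pct)
```

Under this rule a node with 57% support, like the one joining mplus7-11, passes the cutoff; the further split into subtypes rests on sequence features, which this sketch does not capture.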



Fig. 5. Unrooted phylogenetic tree of PlusC OBPs in the three mosquito genomes. The Anopheles gambiae, Aedes aegypti, and Culex quinquefasciatus members are colored in mustard, pink, and turquoise, respectively. The bootstrap values of the branches are indicated on the nodes as percentages. The names of the identified clusters within the PlusC OBP subfamily are indicated on the branches. Detailed alignments of the members of each cluster are provided in supplementary figure S3b, Supplementary Material online.

The Classic subfamily members from the three genomes share an average sequence identity of 15.5%, while the PlusC OBPs share 17.3%. No distinct sequence features could be observed at the subfamily level (Classic, PlusC, and Atypical) because of high sequence divergence. Nevertheless, a close examination of the alignments of the clusters that contain orthologous sequences from the three genomes within each subfamily indicates that the phylogenetic clusters established in this study tend to have specific sequence patterns (supplementary fig. S3a and b, Supplementary Material online). Some subgroups are characterized by a very low average sequence identity, such as the B. mori MinusC subgroup within the Classic OBPs (21.5%), mclassic5 (23.3%), or mplus9 (24.3%), while other subgroups share much higher sequence identities, such as OS-E/OS-F (55.2%), Pbprp4 (60.2%), or mclassic4 (77.3%).
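Average identities such as those quoted for the subfamilies and subgroups depend on how identity is defined. The sketch below uses one common convention, assumed here rather than taken from Materials and Methods: identical residues divided by the number of alignment columns where neither sequence has a gap, averaged over all sequence pairs. The toy sequences are hypothetical.

```python
from itertools import combinations


def pairwise_identity(seq1, seq2, gap="-"):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    compared = [(x, y) for x, y in zip(seq1, seq2) if x != gap and y != gap]
    if not compared:
        return 0.0
    return sum(x == y for x, y in compared) / len(compared)


def average_identity(aligned):
    """Mean pairwise identity, in percent, over all pairs of aligned sequences."""
    pairs = list(combinations(aligned, 2))
    return 100.0 * sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    toy_alignment = ["ACDE", "ACDF", "AC-F"]  # hypothetical aligned fragments
    print(round(average_identity(toy_alignment), 2))  # 80.56
```

Other conventions (dividing by alignment length or by the shorter sequence) give lower values for gappy alignments, so identities are only comparable when computed the same way.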

Sequence Specific Clustering of Two-Domain OBPs 

Unlike the Classic members, the Atypical OBPs formed four major clusters based on bootstrap values, named matype1-matype4 in this study (fig. 6), which show distinct sequence features (supplementary fig. S3c, Supplementary Material online). matype1 forms the smallest cluster among the four subtypes, with two members from each genome, and this cluster is separated from the other three subtypes by high bootstrap values. matype2 is a distinctive type of Atypical member holding only six cysteines in total (C1, C3, C4, C5, C1', and C6') out of the 12 conserved cysteines characteristic of the other subtypes of this subfamily (fig. 1 and supplementary fig. S3c, Supplementary Material online). matype2 also stands out because its N-terminal domain lacks C2 and C5, as previously described. The matype4 members all share a deletion of about 15 residues between C1 and C2, which is the distinguishing feature of this subtype. The matype1 members are orthologous to AgamOBP39, which is located on chromosome 2R, a chromosome otherwise populated with Classic members; this supports their close relationship to the Classic members, as observed in the phylogeny of the individual genomes. Intriguingly, the matype2 members share orthology with A. gambiae OBPs that map to chromosome X, whereas matype3 and matype4 members share orthology with AgamOBPs distributed over chromosomes 3R and 3L.

Fig. 6. Unrooted phylogenetic tree of Atypical odorant binding proteins in the three mosquito genomes. The Anopheles gambiae, Aedes aegypti, and Culex quinquefasciatus members are colored in mustard, pink, and turquoise, respectively. The bootstrap values of the branches are indicated on the nodes as percentages. The names of the identified clusters within the Atypical OBP subfamily are indicated on the branches. Detailed alignments of the members of each cluster are provided in supplementary figure S3c, Supplementary Material online.
Discussion

Evolutionary Aspects of OBP Gene Family in Mosquitoes 

The C. quinquefasciatus (Arensburger et al. 2010) and Aed. aegypti (Nene et al. 2007) genomes, which code for 109 and 111 OBPs, respectively, have a significantly larger OBP gene repertoire than the related A. gambiae genome, which harbors only 67 OBPs. A. gambiae belongs to the Anophelinae, whereas Aed. aegypti and C. quinquefasciatus belong to the Culicinae. These two subfamilies of mosquitoes are estimated to have diverged about 120 My ago (Reidenbach et al. 2009). The increase in the number of genes indicates lineage-specific expansions of the OBP gene family among the mosquito species. The OBP gene family has previously been shown to follow a birth-and-death model of evolution, based on several factors, including gene gain and loss events in lineages, a decrease in the number of orthology groups with increasing divergence time, and an uneven phylogenetic subfamily distribution across species (Vieira and Rozas 2011). Similar observations hold for the OBP genes in Anopheles, Culex, and Aedes: a number of gene gains are observed in the Aedes and Culex species, the distribution of subfamilies is uneven (the MinusC subfamily is absent in Anopheles), and the number of orthologous sequences is higher between Aedes and Culex and lower with respect to Anopheles, which is comparatively distantly related to these two species. This provides further support for the birth-and-death model of evolution of the OBP gene family. Furthermore, the appearance of new genes and subfamilies in Aedes and Culex could reflect the need for these genes in the environmental adaptation of these species.

Evolution of MinusC Proteins in the Mosquito Genomes 

The MinusC subfamily of OBPs was first identified in the Drosophila genome, with some of its members lacking the second and fifth cysteine residues (Hekmat-Scafe et al. 2002), and was later identified in other species, including Apis mellifera (Foret and Maleszka 2006) and B. mori (Gong et al. 2009; Yoshizawa et al. 2011). Interestingly, in mosquitoes, MinusC OBPs are not present in A. gambiae but appear in the Culicinae lineage. The close homology of these MinusC members with the B. mori MinusC OBPs suggests that they could share a common ancestor. Thus, the MinusC OBPs could have appeared in the Endopterygota lineage of insects, which dates back ~300 My, suggesting that these OBPs arose earlier in evolution and not only in the Drosophilidae, Bombyx/Tribolium, and Apis lineages, as previously believed (Vieira and Rozas 2011). However, the absence of these OBPs in A. gambiae is intriguing and suggests that they could have undergone species-specific expansions. This in fact supports the birth-and-death model of evolution observed in the OBP protein family. Separately, the matype2 members of the two-domain OBP subfamily, which retain only six cysteines, interestingly lack the C2 and C5 cysteines in the N-terminal domain. Because the absence of C2 and C5 is the characteristic feature of the MinusC proteins, this raises an important question about the evolutionary link between these members.

Atypical OBPs are Indeed Two-Domain OBPs 

The increase in the number of Atypical OBPs identified in the three mosquito genomes revealed important facets of this subfamily. We have shown that the Atypical OBPs, so far identified only in mosquitoes, are indeed two-domain OBPs. Vieira and Rozas (2011) reported the same, based on a phylogenetic analysis that placed them in the dimer OBP clade. We provide further evidence by characterizing each Atypical OBP domain by its closest homologue in the Classic OBP subfamily of the corresponding genome. Notably, the Classic OBPs retrieved as hits by these domains were mainly members of the mclassic9, mclassic8, and Obp99a groups (table 2). Atypical OBPs are indeed phylogenetically closer to OBPs from the mclassic9, mclassic8, and Obp99a clusters. This relationship lacks significant bootstrap support, however, because of sequence divergence (supplementary fig. S2a-c, Supplementary Material online). Moreover, Atypical gene clusters in A. gambiae are located close to gene clusters that contain Classic OBPs from one of these three groups. On chromosome 2R, the matype1 members AgamOBP39 and AgamOBP40 are located in the same gene cluster as the mclassic9 members AgamOBP11, AgamOBP12, and AgamOBP14. Likewise, on chromosome X, the matype1 OBPs AgamOBP34 to AgamOBP37 are located close to AgamOBP8 and AgamOBP9, which belong to the Obp99a phylogenetic cluster. Similarly, on chromosome 3L, the matype3 genes AgamOBP31, AgamOBP44, and AgamOBP45 form a gene cluster. This cluster lies close to AgamOBP22, a member of the mclassic8 group. Another interesting observation concerns the Drosophila genome. OBP members from these three phylogenetic clusters (mclassic9, mclassic8, and Obp99a) are closely related to its MinusC group of proteins (fig. 4). It has also been established that the Drosophila Dimer OBPs 83cd and 83ef (Zhou et al. 2004), which hold two OBP domains, are related to these Drosophilidae MinusC proteins.

Table 2
Analysis of the Two Putative OBP Domains (N-term and C-term) of Atypical OBPs from Anopheles gambiae, Aedes aegypti, and Culex quinquefasciatus

Mosquito Atypical OBP | Phylogenetic subgroup | Classic OBP closest homologue, N-term | Phylogenetic subgroup | E-value | Classic OBP closest homologue, C-term | Phylogenetic subgroup | E-value | Drosophila OBP closest homologue, N-term | E-value | Drosophila OBP closest homologue, C-term | E-value
AAEL009597 (AaegOBP40) | matype1 | AAEL005772 (AaegOBP22) | Obp99a | 1e-10 | AAEL004342 (AaegOBP18) | mclassic9a | 2e-14 | Obp99b | 5e-11 | Obp99a | 5e-09
AAEL009599 (AaegOBP41) | matype1 | AAEL005772 (AaegOBP22) | Obp99a | 1e-10 | AAEL007014 (AaegOBP79) | No group | 2e-04 | Obp99a | 9e-08 | Obp99a | 8e-04
AGAP002190 (AgamOBP39) | matype1 | AGAP000278 (AgamOBP9) | Obp99a | 1e-10 | AGAP002189 (AgamOBP14) | mclassic9b | 2e-15 | Obp99b | 6e-09 | Obp99a | 5e-09
AGAP002191 (AgamOBP40) | matype1 | AGAP002188 (AgamOBP12) | No group | 2e-08 | - | - | - | Obp44a | 7e-07 | Obp99a | 5e-08
AGAP011647 (AgamOBP30) | matype1 | AGAP010409 (AgamOBP22) | mclassic8 | 2e-11 | AGAP002025 (AgamOBP11) | mclassic9b | 5e-09 | Obp99a | 7e-09 | Obp99a | 4e-03
CPIJ015732 (CquiOBP85) | matype1 | CPIJ010787 (CquiOBP51) | mclassic9a | 2e-10 | CPIJ016343 (CquiOBP63) | mclassic9b | 2e-17 | Obp99b | 4e-07 | Obp99a | 9e-10
CPIJ015733 (CquiOBP86) | matype1 | CPIJ010787 (CquiOBP51) | mclassic9a | 1e-10 | CPIJ010782 (CquiOBP46) | mclassic9b | 4e-04 | Obp44a | 3e-07 | Obp99a | 3e-07
AAEL000318 (AaegOBP92) | matype2 | AAEL007003 (AaegOBP80) | No group | 5e-07 | AAEL004342 (AaegOBP18) | mclassic9a | 1e-03 | Obp44a | 2e-04 | Obp99c | 1e-04
AAEL000319 (AaegOBP93) | matype2 | AAEL002617 (AaegOBP12) | mclassic3a | 2e-03 | AAEL007014 (AaegOBP79) | No group | 3e-04 | Obp44a | 4e-03 | - | -
AAEL000344 (AaegOBP94) | matype2 | AAEL007003 (AaegOBP80) | No group | 1e-04 | AAEL007014 (AaegOBP79) | No group | 4e-02 | Obp44a | 4e-06 | Obp99b | 3e-03
AAEL000350 (AaegOBP95) | matype2 | AAEL011730 (AaegOBP81) | mclassic8 | 3e-05 | AAEL002587 (AaegOBP11) | mclassic3b | 1e-04 | Obp56d | 2e-03 | Obp99b | 5e-07
AAEL000377 (AaegOBP89) | matype2 | AAEL004343 (AaegOBP19) | mclassic9a | 4e-06 | AAEL007014 (AaegOBP79) | No group | 1e-03 | Obp44a | 3e-09 | Obp99b | 7e-04
AAEL001153 (AaegOBP106) | matype2 | AAEL007003 (AaegOBP80) | No group | 7e-05 | AAEL013018 (AaegOBP3) | OS-E/OS-F | 3e-02 | Obp99c | 1e-05 | - | -
AAEL001174 (AaegOBP98) | matype2 | AAEL004343 (AaegOBP19) | mclassic9a | 4e-07 | AAEL004343 (AaegOBP19) | mclassic9a | 9e-07 | Obp44a | 7e-05 | Obp44a | 3e-05
AAEL001179 (AaegOBP99) | matype2 | AAEL004343 (AaegOBP19) | mclassic9a | 8e-08 | AAEL007014 (AaegOBP79) | No group | 1e-02 | Obp99b | 5e-07 | Obp44a | 8e-05
AAEL001189 (AaegOBP105) | matype2 | AAEL004343 (AaegOBP19) | mclassic9a | 8e-07 | AAEL007003 (AaegOBP80) | No group | 8e-04 | Obp44a | 1e-04 | Obp44a | 2e-03
AAEL004516 (AaegOBP104) | matype2 | AAEL004343 (AaegOBP19) | mclassic9a | 2e-04 | - | - | - | Obp44a | 4e-04 | - | -
AAEL009433 (AaegOBP109) | matype2 | AAEL004343 (AaegOBP19) | mclassic9a | 2e-06 | - | - | - | Obp99c | 2e-03 | - | -
AAEL013719 (AaegOBP90) | matype2 | - | - | - | AAEL004343 (AaegOBP19) | mclassic9a | 1e-03 | - | - | - | -
AAEL013720 (AaegOBP91) | matype2 | AAEL007003 (AaegOBP80) | No group | 2e-06 | AAEL004343 (AaegOBP19) | mclassic9a | 1e-04 | Obp44a | 1e-03 | - | -
AAEL014874 (AaegOBP108) | matype2 | AAEL004343 (AaegOBP19) | mclassic9a | 2e-06 | - | - | - | Obp99c | 2e-03 | - | -
AAEL014876 (AaegOBP107) | matype2 | AAEL011730 (AaegOBP81) | mclassic8 | 3e-09 | - | - | - | Obp99c | 1e-05 | - | -
AGAP000641/644 | matype2 | AGAP013182 (AgamOBP59) | ND | 3e-09 | AGAP002025 (AgamOBP11) | mclassic9b | 4e-10 | Pbprp2 | 1e-04 | Pbprp1 | 5e-04
AGAP000642 | matype2 | AGAP013182 (AgamOBP59) | ND | 2e-08 | AGAP002025 (AgamOBP11) | mclassic9b | 2e-06 | Obp56d | 1e-05 | - | -
AGAP000643 | matype2 | AGAP013182 (AgamOBP59) | ND | 8e-09 | AGAP002025 (AgamOBP11) | mclassic9b | 2e-06 | Obp56d | 1e-05 | - | -
CPIJ001690 | matype2 | CPIJ009937 | mclassic8 | 2e-07 | - | - | - | Obp99a | 4e-07 | - | -
CPIJ003863 | matype2 | CPIJ009937 | mclassic8 | 2e-05 | CPIJ010782 (CquiOBP46) | mclassic9b | 4e-02 | Obp44a | 4e-07 | - | -
CPIJ003865 | matype2 | CPIJ009937 | mclassic8 | 4e-06 | CPIJ010782 (CquiOBP46) | mclassic9b | 2e-02 | Obp44a | 7e-07 | - | -
CPIJ003866 | matype2 | CPIJ009937 | mclassic8 | 3e-07 | CPIJ010782 (CquiOBP46) | mclassic9b | 1e-04 | Obp44a | 1e-09 | Obp99b | 1e-03
CPIJ003867 | matype2 | CPIJ009937 | mclassic8 | 1e-09 | CPIJ010782 (CquiOBP46) | mclassic9b | 4e-02 | Obp99a | 9e-07 | - | -
CPIJ017163 | matype2 | CPIJ009937 | mclassic8 | 2e-06 | CPIJ017326 | Obp99a | 6e-02 | Obp44a | 4e-04 | Obp56g | 4e-03
CPIJ017164 | matype2 | CPIJ009937 | mclassic8 | 6e-06 | CPIJ010789 | mclassic7 | 2e-02 | Obp44a | 2e-04 | - | -
CPIJ017165 | matype2 | CPIJ009937 | mclassic8 | 7e-10 | CPIJ010782 (CquiOBP46) | mclassic9b | 1e-02 | Obp99c | 2e-04 | - | -
CPIJ017166 | matype2 | CPIJ009937 | mclassic8 | 1e-03 | CPIJ016343 (CquiOBP63) | mclassic9b | 3e-02 | Obp44a | 1e-04 | - | -
CPIJ017167 | matype2 | CPIJ009937 | mclassic8 | 1e-03 | CPIJ001365 | Pbprp1 | 1e-02 | Obp44a | 4e-04 | Obp56d | 7e-05
CPIJ017168 | matype2 | CPIJ016951 | Obp19a | 6e-04 | CPIJ016343 (CquiOBP63) | mclassic9b | 3e-02 | - | - | - | -
CPIJ017169 (CquiOBP100) | matype2 | CPIJ009937 | mclassic8 | 8e-03 | CPIJ010789 | mclassic7 | 2e-02 | Obp44a | 3e-03 | Obp99b | 2e-03
CPIJ017170 | matype2 | CPIJ009937 | mclassic8 | 3e-02 | CPIJ016343 (CquiOBP63) | mclassic9b | 6e-03 | Obp44a | 6e-03 | - | -
AAEL006385 | matype3 | AAEL002596 | mclassic3a | 8e-04 | AAEL004343 (AaegOBP19) | mclassic9a | 2e-06 | Obp56d | 8e-04 | Obp99a | 9e-08
AAEL006387 | matype3 | AAEL002617 (AaegOBP12) | mclassic3a | 1e-05 | AAEL004343 (AaegOBP19) | mclassic9a | 3e-06 | Obp56d | 3e-05 | Obp99a | 5e-07
AAEL006393 | matype3 | AAEL002617 (AaegOBP12) | mclassic3a | 1e-05 | AAEL004343 (AaegOBP19) | mclassic9a | 3e-06 | Obp56d | 3e-05 | Obp99a | 5e-07
AAEL006396 (AaegOBP31) | matype3 | AAEL002617 (AaegOBP12) | mclassic3a | 1e-05 | AAEL004342 (AaegOBP18) | mclassic9a | 2e-03 | Obp56d | 1e-05 | Obp99a | 4e-05
AAEL006398 (AaegOBP32) | matype3 | AAEL011730 (AaegOBP81) | mclassic8 | 2e-02 | AAEL011730 (AaegOBP81) | mclassic8 | 1e-06 | Obp56d | 9e-03 | Obp99a | 1e-06
AGAP010648 (AgamOBP44) | matype3 | AGAP002025 (AgamOBP11) | mclassic9b | 1e-10 | AGAP002025 (AgamOBP11) | mclassic9b | 1e-08 | Obp99a | 2e-05 | Obp99b | 5e-04
AGAP010649 (AgamOBP31) | matype3 | AGAP013182 (AgamOBP59) | ND | 5e-09 | AGAP002025 (AgamOBP11) | mclassic9b | 1e-12 | Obp99a | 8e-07 | Obp99b | 6e-08
AGAP010650 (AgamOBP45) | matype3 | AGAP013182 (AgamOBP59) | ND | 1e-11 | AGAP002189 (AgamOBP14) | mclassic9b | 3e-11 | Obp99b | 9e-05 | Obp99a | 3e-10
CPIJ009038 | matype3 | CPIJ009937 | mclassic8 | 9e-08 | CPIJ009937 | mclassic8 | 7e-08 | Obp56d | 2e-03 | Obp99a | 1e-10
CPIJ017342 | matype3 | CPIJ009937 | mclassic8 | 2e-08 | CPIJ006551 | Obp19a | 2e-05 | Obp56c | 7e-03 | Obp99b | 1e-08
AAEL000796 | matype4 | AAEL011730 (AaegOBP81) | mclassic8 | 8e-04 | AAEL004343 (AaegOBP19) | mclassic9a | 6e-07 | - | - | Obp56i | 1e-05
AAEL000821 | matype4 | AAEL011730 (AaegOBP81) | mclassic8 | 1e-05 | AAEL005770 (AaegOBP21) | Obp99a | 1e-06 | - | - | - | -
AAEL000827 | matype4 | AAEL007014 (AaegOBP79) | No group | 4e-03 | AAEL004342 (AaegOBP18) | mclassic9a | 4e-05 | - | - | Obp99a | 5e-05
AAEL000831 | matype4 | AAEL011730 (AaegOBP81) | mclassic8 | 1e-02 | AAEL002596 | mclassic3a | 3e-05 | - | - | Obp56g | 1e-04
AAEL000833 | matype4 | AAEL004339 | mclassic7 | 4e-03 | AAEL004343 (AaegOBP19) | mclassic9a | 8e-05 | - | - | Obp99d | 5e-05
AAEL000835 | matype4 | AAEL011730 (AaegOBP81) | mclassic8 | 8e-03 | AAEL004343 (AaegOBP19) | mclassic9a | 6e-07 | - | - | Obp56i | 1e-05
AAEL000837 (AaegOBP112) | matype4 | - | - | - | AAEL011730 (AaegOBP81) | mclassic8 | 6e-08 | - | - | Pbprp2 | 7e-04
AAEL001487 (AaegOBP114) | matype4 | AAEL011730 (AaegOBP81) | mclassic8 | 4e-03 | - | - | - | Obp51a | 1e-02 | - | -
AAEL003311 (AaegOBP111) | matype4 | AAEL011730 (AaegOBP81) | mclassic8 | 7e-05 | AAEL002596 | mclassic3a | 2e-04 | - | - | Obp99b | 1e-03
AAEL003315 | matype4 | AAEL011730 (AaegOBP81) | mclassic8 | 2e-08 | AAEL005770 (AaegOBP21) | Obp99a | 2e-04 | - | - | Obp99c | 4e-03
AAEL003511 | matype4 | AAEL011730 (AaegOBP81) | mclassic8 | 2e-04 | AAEL004343 (AaegOBP19) | mclassic9a | 1e-06 | - | - | Obp99a | 2e-05
AAEL003513 | matype4 | AAEL011730 (AaegOBP81) | mclassic8 | 5e-07 | AAEL005770 (AaegOBP21) | Obp99a | 1e-07 | - | - | Obp99a | 2e-05
AAEL003525 | matype4 | AAEL011730 (AaegOBP81) | mclassic8 | 2e-03 | AAEL005770 (AaegOBP21) | Obp99a | 4e-07 | - | - | Obp99a | 2e-04
AAEL003538 | matype4 | AAEL011730 (AaegOBP81) | mclassic8 | 5e-07 | AAEL005770 (AaegOBP21) | Obp99a | 1e-07 | - | - | Obp99a | 2e-05
AAEL004856 | matype4 | AAEL011730 (AaegOBP81) | mclassic8 | 3e-05 | AAEL007014 (AaegOBP79) | No group | 6e-06 | - | - | Obp99a | 1e-04
AAEL010714 | matype4 | AAEL011730 (AaegOBP81) | mclassic8 | 3e-05 | AAEL005770 (AaegOBP21) | Obp99a | 2e-07 | - | - | Obp99a | 8e-06
AAEL010718 | matype4 | AAEL011730 (AaegOBP81) | mclassic8 | 5e-07 | AAEL005770 (AaegOBP21) | Obp99a | 2e-06 | Obp56g | 6e-03 | Obp99a | 1e-04
AAEL010872 (AaegOBP46) | matype4 | AAEL011730 (AaegOBP81) | mclassic8 | 9e-05 | AAEL004342 (AaegOBP18) | mclassic9a | 5e-05 | - | - | Pbprp5 | 8e-04
AAEL010874 (AaegOBP88) | matype4 | AAEL011730 (AaegOBP81) | mclassic8 | 3e-06 | AAEL005770 (AaegOBP21) | Obp99a | 3e-06 | - | - | Obp99b | 3e-05
AAEL010875 (AaegOBP103) | matype4 | AAEL011730 (AaegOBP81) | mclassic8 | 3e-06 | AAEL005770 (AaegOBP21) | Obp99a | 2e-05 | - | - | Obp99d | 7e-05
CPIJ000653 (CquiOBP84) | matype4 | - | - | - | CPIJ016343 (CquiOBP63) | mclassic9b | 2e-06 | - | - | Obp99b | 3e-04
CPIJ008154 (CquiOBP83) | matype4 | CPIJ014525 (CquiOBP24) | mclassic6 | 1e-02 | CPIJ016343 (CquiOBP63) | mclassic9b | 7e-05 | - | - | - | -
CPIJ008155 | matype4 | CPIJ010789 | mclassic7 | 2e-03 | CPIJ016343 (CquiOBP63) | mclassic9b | 1e-06 | - | - | Obp99a | 2e-04
CPIJ008156 | matype4 | CPIJ010789 | mclassic7 | 4e-02 | CPIJ010782 (CquiOBP46) | mclassic9b | 2e-11 | - | - | Obp99a | 2e-08
CPIJ008157 | matype4 | CPIJ010787 (CquiOBP51) | mclassic9a | 1e-02 | CPIJ010782 (CquiOBP46) | mclassic9b | 1e-09 | - | - | Obp99a | 6e-08
CPIJ008158 (CquiOBP77) | matype4 | CPIJ016343 (CquiOBP63) | mclassic9b | 1e-08 | CPIJ016343 (CquiOBP63) | mclassic9b | 8e-07 | Obp99a | 1e-05 | Obp99a | 3e-08
CPIJ008159 (CquiOBP78) | matype4 | CPIJ009937 | mclassic8 | 6e-03 | CPIJ016343 (CquiOBP63) | mclassic9b | 4e-10 | - | - | Obp99a | 6e-09
CPIJ008160 | matype4 | CPIJ009937 | mclassic8 | 1e-02 | CPIJ016343 (CquiOBP63) | mclassic9b | 1e-09 | - | - | Obp44a | 3e-03
CPIJ008161 | matype4 | CPIJ010789 | mclassic7 | 2e-02 | CPIJ016343 (CquiOBP63) | mclassic9b | 2e-09 | - | - | Obp99b | 3e-06
AAEL008640 | - | AAEL011730 (AaegOBP81) | mclassic8 | 2e-08 | AAEL011730 (AaegOBP81) | mclassic8 | 1e-05 | - | - | - | -
AAEL014430 | - | AAEL007003 (AaegOBP80) | No group | 2e-08 | AAEL011730 (AaegOBP81) | mclassic8 | 2e-05 | Obp99c | 3e-07 | Obp44a | 1e-06
AAEL014431 (AaegOBP110) | - | AAEL011730 (AaegOBP81) | mclassic8 | 4e-12 | AAEL004342 (AaegOBP18) | mclassic9a | 9e-09 | Obp99b | 1e-07 | Obp99b | 3e-03
AGAP000580 (AgamOBP38) | - | AGAP002189 (AgamOBP14) | mclassic9b | 2e-06 | AGAP002025 (AgamOBP11) | mclassic9b | 4e-05 | Obp99b | 6e-05 | Obp99c | 2e-06
AGAP000638 | - | AGAP010409 (AgamOBP22) | mclassic8 | 2e-11 | AGAP002025 (AgamOBP11) | mclassic9b | 8e-07 | Obp99a | 1e-06 | Obp99c | 1e-03
AGAP000640 (AgamOBP33) | - | AGAP010409 (AgamOBP22) | mclassic8 | 2e-11 | AGAP002025 (AgamOBP11) | mclassic9b | 8e-07 | Obp99a | 1e-06 | Obp99c | 1e-03
AGAP005182 (AgamOBP41) | - | AGAP013182 (AgamOBP59) | ND | 8e-07 | AGAP002025 (AgamOBP11) | mclassic9b | 7e-11 | - | - | Obp44a | 2e-04
AGAP009065 (AgamOBP42) | - | AGAP013182 (AgamOBP59) | ND | 6e-10 | AGAP002025 (AgamOBP11) | mclassic9b | 2e-08 | Obp99a | 5e-05 | Obp99c | 7e-05
AGAP009402 (AgamOBP43) | - | AGAP010409 (AgamOBP22) | mclassic8 | 5e-11 | AGAP002189 (AgamOBP14) | mclassic9b | 6e-16 | Obp99a | 4e-06 | Obp99a | 2e-09

Note. The table shows the top BLAST hits among all mosquito Classic OBPs and Drosophila OBPs after splitting the Atypical proteins into their two respective putative domains.

The recent report of a functional dimer in the C. quinquefasciatus genome (Mao et al. 2010) supports the current speculation on Atypical members. It also points to the importance of two-domain proteins in binding relatively large ligands. Taken together, these results confirm that the Atypical OBP members are indeed two-domain OBPs, like the Dimer OBPs previously observed in Drosophila. They are therefore no longer specific to the mosquito genomes, as had been reported earlier (Xu et al. 2003). Furthermore, the matype2 members carry only 6 cysteines instead of the 12 found in the other two-domain OBPs. This suggests a possible adaptation of the fold, with 3 disulphide bonds in place of the 6 disulphide bonds in the other types.
The location of these matype2 OBP genes on the A. gambiae X chromosome further suggests that these proteins may be important in the blood-feeding mechanism of female mosquitoes. Interestingly, most members of the two-domain OBP subfamily are reported as differentially expressed in a blood-feeding time series, which adds to the importance of these proteins in host recognition (Dissanayake et al. 2010).

Ecological adaptations might have driven the observed expansion of the two-domain OBP gene repertoire in the Aed. aegypti and C. quinquefasciatus genomes compared with A. gambiae. Our observations indicate that this expansion most probably occurred through gene duplication events in localized genome regions, which led to the observed gene clusters. We therefore hypothesize that two distinct mechanisms could underlie the emergence of Atypical genes in mosquitoes. The observations in the A. gambiae genome support the first hypothesis. Under this hypothesis, two-domain OBPs might have originated from duplicates of mclassic9-, mclassic8-, or Obp99a-related genes. Their subsequent fusion would have produced Atypical genes coclustered with their Classic counterparts. The observations in Aed. aegypti and C. quinquefasciatus support a second, complementary hypothesis. Under this hypothesis, the Atypical genes underwent further gene duplications, probably in response to ecological constraints in these mosquito lineages.

Our analysis therefore supports the proposal that the Atypical OBP genes be renamed two-domain OBPs. Their future structural characterization and ligand-binding profiling would be of significant importance in deciphering their contribution to olfaction in mosquitoes.

Materials and Methods 

Sequence Searches 

The predicted protein sequences of the three mosquito genomes were downloaded from VectorBase (Lawson et al. 2009) version 3.4 (http://www.vectorbase.org, last accessed January 9, 2013) and Ensembl Genomes (Hubbard et al. 2009). The annotations used were A. gambiae AgamP3.4, Aed. aegypti AaegL1.1, and C. quinquefasciatus CpipJ1.2. The putative OBPs in the three mosquito species were identified by a PSI-BLAST (Altschul et al. 1997) run of 10 Drosophila query sequences. These queries belong to three subfamilies: Classic/General OBPs, PlusC OBPs, and MinusC OBPs. The run used an E-value cutoff of 3e-10 (Vieira et al. 2007) and an alignment length cutoff of 75% with respect to the query sequence. At this level, all of the previously identified members in the three genomes were recovered, along with a few additional members. A second PSI-BLAST run was then initiated with the hits from the first run. This protocol picked up all the OBPs reported so far (Vogt 2002; Xu et al. 2003; Zhou et al. 2004, 2008; Pelletier and Leal 2009, 2011; Vieira and Rozas 2011). It also identified a remarkable number of additional members. The additional sequences were checked for the presence of a signal peptide using the SignalP server (Petersen et al. 2011). Classic OBPs were also checked for the PBP/GOBP domain using CD-Search (Marchler-Bauer and Bryant 2004). Atypical and PlusC proteins were instead checked by aligning the new sequences with their subfamily members. The D7 proteins were also identified by this method, although they are considered a distinct protein family related to the OBPs. They were retained for further analysis and used as an outgroup in constructing the phylogenetic trees. Orthologous sequences were identified with the reciprocal best hit approach using BLAST (Moreno-Hagelsieb and Latimer 2008). The newly added sequences were named according to the conventions used in earlier reports (Vogt 2002; Xu et al. 2003; Zhou et al. 2004, 2008; Pelletier and Leal 2009).
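The two search cutoffs described above can be written as a simple filter over search hits: an E-value of at most 3e-10 and an alignment covering at least 75% of the query. The sketch below is only an illustration. The `Hit` record and its fields are hypothetical stand-ins for parsed PSI-BLAST output. It also assumes that coverage is the aligned length on the query divided by the full query length, and that both cutoffs are inclusive.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for one parsed PSI-BLAST hit."""
    query_id: str
    subject_id: str
    evalue: float
    aligned_query_len: int  # residues of the query covered by the alignment
    query_len: int          # full length of the query sequence


def passes_cutoffs(hit: Hit, max_evalue: float = 3e-10,
                   min_query_coverage: float = 0.75) -> bool:
    """True if the hit meets both the E-value and the query-coverage cutoff
    (both treated as inclusive, which is an assumption)."""
    coverage = hit.aligned_query_len / hit.query_len
    return hit.evalue <= max_evalue and coverage >= min_query_coverage


def retained_subjects(hits: list[Hit]) -> set[str]:
    """Subject IDs of all hits that pass the cutoffs (candidate OBPs)."""
    return {h.subject_id for h in hits if passes_cutoffs(h)}


if __name__ == "__main__":
    demo = [
        Hit("query1", "gene_a", 1e-12, 100, 120),  # passes both cutoffs
        Hit("query1", "gene_b", 1e-5, 110, 120),   # E-value too high
        Hit("query1", "gene_c", 1e-15, 80, 120),   # covers only ~67% of the query
    ]
    print(retained_subjects(demo))  # {'gene_a'}
```

For a second search round, the retained subjects would simply be used as new queries.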



Multiple Sequence Alignment 

The multiple sequence alignment forms the basis for any analysis of a protein family, so an accurate alignment is essential. The alignment error rate increases with the divergence of the proteins. Structure-based alignments are considered the most accurate, so in this study a structure alignment was used to guide construction of the alignments. The structure alignment was built from 10 OBPs in the OBP gene family using COMPARER (Sali and Blundell 1990). However, structural data were limited, so only seven OBP members and two D7 members could be used as structure-alignment profiles (data not shown). The OBP and D7 sequences were aligned to their respective structure alignments as profiles. A combined alignment of the two protein families was then built with the profile-profile alignment option of ClustalX (Thompson et al. 1994, 1997; Jeanmougin et al. 1998). The alignments were truncated at the N-terminal end, guided by the structure alignment. This region corresponds to the signal peptide and has a high substitution rate. The C-terminal ends, however, were retained because Atypical subfamily members carry an extended C-terminus. This method was applied to align the sequences from all three genomes. Alignments for the different subclasses were constructed with sequences from all three mosquito genomes. For the Classic subfamily, Drosophila sequences were also included. The alignments of the Atypical and PlusC subclasses, however, were not based on the structure alignment.

Phylogenetic Analysis 

The phylogenetic trees were inferred using the Neighbor-Joining method (Saitou and Nei 1987) in MEGA 4.0 (Tamura et al. 2007). The percentage of replicate trees in which the associated sequences clustered together in the bootstrap test (1,000 replicates) is shown next to the branches of the bootstrap consensus trees (Felsenstein 1985). Branches with <50% bootstrap support were collapsed. The evolutionary distances were computed using the Poisson correction method (Zuckerkandl and Pauling 1965) and are in units of the number of amino acid substitutions per site. Positions containing alignment gaps and missing data were eliminated only in pairwise sequence comparisons (pairwise deletion option). The trees were rooted on the branches of the D7 family of proteins, which served as the outgroup (supplementary fig. S2a-c, Supplementary Material online).
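For reference, the Poisson-corrected distance is d = -ln(1 - p), where p is the proportion of differing amino acids among the sites compared. The sketch below applies it with pairwise deletion, so a column is skipped only for the pair of sequences in which one of them has a gap or missing residue. It is an illustration of the formula, not MEGA's implementation. Treating '-', '?' and 'X' as gaps or missing data is an assumption.

```python
import math

# Assumption: these symbols mark gaps or missing data in the alignment.
GAP_OR_MISSING = frozenset("-?X")


def poisson_distance(seq1: str, seq2: str) -> float:
    """Poisson-corrected distance d = -ln(1 - p) between two aligned protein
    sequences, using pairwise deletion of gapped or missing positions."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    compared = differences = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a in GAP_OR_MISSING or b in GAP_OR_MISSING:
            continue  # pairwise deletion: drop this column for this pair only
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    p = differences / compared  # uncorrected proportion of differing sites
    if p >= 1.0:
        return math.inf  # the correction is undefined when every site differs
    return -math.log(1.0 - p)
```

For example, "AC-EF" against "ACDQF" compares four sites, one of which differs (p = 0.25). The corrected distance is therefore -ln(0.75), about 0.288 substitutions per site.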

The trees of the different subclasses (figs. 4-6) used for the comparative analysis of the three genomes were analyzed as unrooted trees. They were inferred with the same Neighbor-Joining method (Saitou and Nei 1987) in MEGA 4.0 (Tamura et al. 2007). Bootstrap percentages from 1,000 replicates are shown next to the branches of the bootstrap consensus trees (Felsenstein 1985). Branches with <50% bootstrap support were collapsed for the PlusC and Atypical OBP trees. They were not collapsed for the Classic OBP tree, although subtype definition was still based on a 50% bootstrap cutoff.
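Collapsing weakly supported branches turns each internal node with bootstrap support below the cutoff into a polytomy. Its children are attached directly to its parent. The following sketch illustrates this on a toy tree. The `Node` class and the `to_newick` helper are hypothetical, written only for the illustration, since MEGA performs this step internally.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical tree node: leaves have a name, internal nodes have children."""
    name: Optional[str] = None
    support: Optional[float] = None  # bootstrap % of the branch above this node
    children: list[Node] = field(default_factory=list)


def collapse_weak_branches(node: Node, cutoff: float = 50.0) -> Node:
    """Return a new tree in which every internal branch with support below
    `cutoff` is collapsed, attaching its children directly to its parent."""
    kept: list[Node] = []
    for child in node.children:
        child = collapse_weak_branches(child, cutoff)
        weak = (bool(child.children) and child.support is not None
                and child.support < cutoff)
        if weak:
            kept.extend(child.children)  # turn the weak clade into a polytomy
        else:
            kept.append(child)
    return Node(node.name, node.support, kept)


def to_newick(node: Node) -> str:
    """Newick string (without the trailing ';') with support as internal labels."""
    if not node.children:
        return node.name or ""
    inner = ",".join(to_newick(c) for c in node.children)
    label = "" if node.support is None else f"{node.support:g}"
    return f"({inner}){label}"
```

In the toy tree ((A,B)40,(C,D)90), collapsing at 50% yields (A,B,(C,D)90). The weak clade is dissolved and the well-supported clade is kept.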

Orthology, Paralogy, Chromosomal Mapping, and Tentative Syntenic Analysis

OBP orthologs were identified using the reciprocal best BLAST hit approach (Moreno-Hagelsieb and Latimer 2008), which is widely used to detect orthologs. The InParanoid database (O'Brien et al. 2005) was used to examine the inparalogous relationships between OBPs. At the time of this work, assembled genome data in Ensembl Genomes (Hubbard et al. 2009) and VectorBase (Lawson et al. 2009) were available only for A. gambiae. The chromosomal locations of A. gambiae OBPs were identified from these data. The Aed. aegypti and C. quinquefasciatus genome data in Ensembl Genomes and VectorBase are not yet assembled into chromosomes. The OBP genes in these genomes were therefore mapped at the supercontig level. Exact chromosomal locations are known for only about 10% of their supercontigs, and very few of these harbor OBP genes. Orthologous OBP genes identified as described above were used to establish putative synteny between A. gambiae chromosomal segments and supercontigs from the two Culicinae species. The genes were mapped to their respective locations on the chromosomes or supercontigs (supplementary fig. S1a-e, Supplementary Material online). The A. gambiae chromosomes were used as the reference and are represented as yellow bars. The Aedes and Culex contigs are represented in purple and green, respectively. Direct three-way (1:1:1) orthology relationships among the three genomes are represented as green lines. Two-way (1:1) orthology relationships between two species are represented as black lines, and inparalogy relationships as red lines. The chromosomal mapping figures were drawn to scale using Adobe Illustrator CS5.
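A minimal sketch of the reciprocal best hit criterion follows. It assumes that each search direction has already been reduced to a table of scores keyed by (query, subject). Ranking hits by a higher-is-better score such as the bit score, and breaking ties alphabetically, are assumptions made for the illustration rather than details from the methods. The gene identifiers in the usage note are hypothetical.

```python
def best_hits(scores: dict[tuple[str, str], float]) -> dict[str, str]:
    """Map each query to its single best-scoring subject.

    `scores` maps (query, subject) to a score where higher is better
    (a bit score, for example). Ties go to the alphabetically first subject.
    """
    best: dict[str, tuple[float, str]] = {}
    for (query, subject), score in scores.items():
        prev = best.get(query)
        if (prev is None or score > prev[0]
                or (score == prev[0] and subject < prev[1])):
            best[query] = (score, subject)
    return {query: subject for query, (_, subject) in best.items()}


def reciprocal_best_hits(a_vs_b: dict[tuple[str, str], float],
                         b_vs_a: dict[tuple[str, str], float]) -> set[tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in genome B and a is b's best hit in genome A."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}
```

In this sketch, if agam_1 and agam_2 both hit aaeg_1 best, but aaeg_1 hits agam_1 best in the reverse search, only (agam_1, aaeg_1) is kept as an ortholog pair. Three-way (1:1:1) orthologs would then be pairs that agree across all three genome comparisons.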

Atypical Domain Analysis 

The two constituent PBP/GOBP domains of Atypical OBPs were further characterized for their relationship with Classic or PlusC OBPs. For each Atypical OBP, the boundary between the N-term and C-term PBP/GOBP domains was delimited manually. The full-length sequence was submitted to Pfam (Finn et al. 2010) and the Conserved Domain Database (Marchler-Bauer et al. 2011), and the boundary was further validated by analyzing the cysteine profiles. Each N-term and C-term domain delimited in this way was then subjected to a PSI-BLAST search (E-value cutoff of 1e-2). The search database contained all OBPs from the same mosquito species, so that putative distantly related single-domain OBPs could be found. A similar search was performed against a database of Drosophila OBPs.
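Checking a manually chosen domain boundary against the cysteine profile can be sketched as below. The sketch assumes that each complete PBP/GOBP domain should carry six cysteines, consistent with the 12 cysteines reported for two-domain OBPs. The matype2 members, with only six cysteines in total, would need a different expectation. The function names and the toy sequence in the usage note are hypothetical.

```python
def split_domains(sequence: str, boundary: int) -> tuple[str, str]:
    """Split a full-length Atypical OBP at a 0-based boundary index into its
    putative N-terminal and C-terminal domains."""
    if not 0 < boundary < len(sequence):
        raise ValueError("boundary must fall strictly inside the sequence")
    return sequence[:boundary], sequence[boundary:]


def cysteine_positions(domain: str) -> list[int]:
    """1-based positions of cysteine residues within one domain."""
    return [i for i, residue in enumerate(domain.upper(), start=1) if residue == "C"]


def boundary_is_consistent(sequence: str, boundary: int,
                           expected_per_domain: int = 6) -> bool:
    """True if both putative domains carry the expected number of cysteines
    (six per domain is an assumption based on the Classic OBP pattern)."""
    n_term, c_term = split_domains(sequence, boundary)
    return (len(cysteine_positions(n_term)) == expected_per_domain
            and len(cysteine_positions(c_term)) == expected_per_domain)
```

With the toy sequence "MCAACKKCLLCGGCHHC" + "PCQQCRRCSSCTTCVVC", a boundary at index 17 gives six cysteines in each half. A boundary at index 10 leaves only three in the N-terminal part and fails the check.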

Supplementary Material 

Supplementary figures S1-S3 and tables S1-S4 are available at Genome Biology and Evolution online (http://www.gbe.oxfordjournals.org/).